📚 node [[hashing|hashing]]
Welcome! Nobody has contributed anything to 'hashing|hashing' yet. You can:
  • Write something in the document below!
    • There is at least one public document in every node in the Agora. Whatever you write in it will be integrated and made available for the next visitor to read and edit.
  • Write to the Agora from social media.
    • If you follow Agora bot on a supported platform and include the wikilink [[hashing|hashing]] in a post, the Agora will link it here and optionally integrate your writing.
  • Sign up as a full Agora user.
    • As a full user you will be able to contribute your personal notes and resources directly to this knowledge commons. Some setup required :)
⥅ related node [[hashing function]]
⥅ related node [[hashing]]
⥅ node [[hashing]] pulled by Agora

hashing

Go back to the [[AI Glossary]]

In machine learning, a mechanism for bucketing categorical data, particularly when the number of categories is large, but the number of categories actually appearing in the dataset is comparatively small.

For example, Earth is home to about 60,000 tree species. You could represent each of the 60,000 tree species in 60,000 separate categorical buckets. Alternatively, if only 200 of those tree species actually appear in a dataset, you could use hashing to divide tree species into perhaps 500 buckets.

A single bucket could contain multiple tree species. For example, hashing could place baobab and red maple—two genetically dissimilar species—into the same bucket. Regardless, hashing is still a good way to map large categorical sets into the desired number of buckets. Hashing turns a categorical feature having a large number of possible values into a much smaller number of values by grouping values in a deterministic way.

For more information on hashing, see the Feature Columns chapter in the TensorFlow Programmers Guide.

⥅ node [[hashing-function]] pulled by Agora
📖 stoas
⥱ context